Simplified Tool Call Accuracy V1 #40710

ghyadav · 2025-04-24T16:33:30Z

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.

All SDK Contribution checklist:

The pull request does not introduce breaking changes.
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

Copilot

Pull Request Overview

This PR introduces a simplified mechanism for predicting and evaluating tool call accuracy by adding a new tool call predictor module and a corresponding evaluator.

Added the predict_tools function in _tool_call_predictor.py to generate ground truth based on a prompt model.
Exposed predict_tools via an __init__.py file for easier consumption.
Included a new evaluator in _tool_accuracy_new.py that implements scoring logic for tool calls and tool results.
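For readers unfamiliar with this kind of evaluator, the match-scoring idea can be illustrated with a minimal sketch. This is not the code from the PR: the function name `tool_call_match_score`, the `{"name": ..., "arguments": ...}` call shape, and the exact-equality matching rule are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical call shape for this sketch: {"name": str, "arguments": dict}
ToolCall = Dict[str, Any]


def tool_call_match_score(actual: List[ToolCall], expected: List[ToolCall]) -> float:
    """Return the fraction of expected tool calls found among the actual calls.

    A call matches when both the tool name and the arguments are equal.
    Each actual call can satisfy at most one expected call.
    """
    if not expected:
        # Assumption: with nothing expected, any actual call counts as a miss.
        return 1.0 if not actual else 0.0

    remaining = list(actual)  # copy so the caller's list is not mutated
    matched = 0
    for want in expected:
        for i, got in enumerate(remaining):
            same_name = got.get("name") == want.get("name")
            same_args = got.get("arguments") == want.get("arguments")
            if same_name and same_args:
                matched += 1
                del remaining[i]  # consume the matched call
                break
    return matched / len(expected)


actual_calls = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "send_email", "arguments": {"to": "a@b.c"}},
]
expected_calls = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "get_time", "arguments": {}},
]
score = tool_call_match_score(actual_calls, expected_calls)
```

In this sketch, the expected calls would play the role of the ground truth that a predictor generates; a real evaluator may well use a looser argument comparison than strict equality.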

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.

File	Description
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_calls_predictor/_tool_call_predictor.py	Implements the tool call predictor using prompt models (contains duplicate imports that could lead to ambiguity).
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_calls_predictor/__init__.py	Exposes the predict_tools function for the caller.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_call_accuracy_new.py	Introduces new evaluation logic for tool call accuracy including extraction and match scoring.

Files not reviewed (1)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_calls_predictor/tool_call_predictor.prompty: Language not supported

Comments suppressed due to low confidence (2)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_calls_predictor/_tool_call_predictor.py:29

Duplicate import of AsyncPrompty detected (also imported on line 10). Consider consolidating the imports to a single source to avoid ambiguity.

from promptflow.core import AsyncPrompty
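The ambiguity this comment warns about comes from how Python binds imported names: when the same name is imported twice from different sources, the later import silently replaces the earlier one. The sketch below shows that behavior with two standard-library functions that share the name `join`; it uses them only as stand-ins and makes no claim about where the first `AsyncPrompty` import in the PR comes from.

```python
# Two imports that bind the same name from different modules.
from os.path import join  # first binding: path joining
from shlex import join    # second binding replaces the first without any warning

# The name now refers to shlex.join, which takes a list of shell tokens.
bound_module = join.__module__
joined = join(["echo", "hello"])
```

Keeping a single import per name removes the question of which binding is in effect at a given call site.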

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_call_accuracy_new.py:151

The function 'generate_ground_truth' is referenced but not defined or imported; please add its definition or import it.

ground_truth = generate_ground_truth(query, response, tool_definitions)
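If the comment is accurate, the practical consequence is that the module would import without complaint and fail only when the flagged line executes. A minimal sketch of that failure mode follows; the wrapper function `evaluate` is hypothetical and merely mirrors the flagged call.

```python
def evaluate(query, response, tool_definitions):
    # generate_ground_truth is deliberately left undefined in this sketch,
    # mirroring the flagged line; Python resolves the name only at call time.
    return generate_ground_truth(query, response, tool_definitions)


# Defining evaluate() succeeds; the error appears only when it is called.
try:
    evaluate("query", "response", [])
    error_message = None
except NameError as exc:
    error_message = str(exc)
```

Because the lookup is deferred, a problem of this kind can slip past an import-only smoke test and is better caught by a linter or by a unit test that exercises the call.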

azure-sdk · 2025-04-24T16:53:40Z

API change check

API changes are not detected in this pull request.

singankit · 2025-04-24T17:16:03Z

@ghyadav Can you please share context on the requirements for a new tool call accuracy evaluator?

github-actions · 2025-06-27T05:01:36Z

Hi @ghyadav. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

github-actions · 2025-07-04T08:38:54Z

Hi @ghyadav. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing /reopen if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the no-recent-activity label; otherwise, this is likely to be closed again with the next cleanup pass.

Simplified Tool Call Accuracy V1

8a922bb

Copilot AI review requested due to automatic review settings April 24, 2025 16:33

ghyadav requested a review from a team as a code owner April 24, 2025 16:33

github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Apr 24, 2025

Copilot AI reviewed Apr 24, 2025

github-actions bot added the no-recent-activity There has been no recent activity on this issue. label Jun 27, 2025

github-actions bot closed this Jul 4, 2025
